
This new AI jailbreaking technique lets hackers crack models in just three interactions


A new jailbreaking technique could allow threat actors to gradually bypass safety guardrails in popular LLMs and coax them into generating harmful content, a new report warns.

The ‘Deceptive Delight’ technique, uncovered by researchers at Palo Alto Networks’ Unit 42, was able to elicit unsafe responses from models in just three interactions.
